Robust Similarity Measures for Named Entities Matching
Abstract
Matching coreferent named entities without prior knowledge requires good similarity measures. Soft-TFIDF is a fine-grained measure that performs well on this task. We propose to enhance this kind of metric through a generic model in which measures may be mixed, and we show experimentally the relevance of this approach.
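For readers unfamiliar with Soft-TFIDF, the following is a minimal sketch of the measure as it is commonly described: a TF-IDF cosine in which tokens need not match exactly, only be close under a secondary character-level similarity. It is not the model proposed in the paper. The names `char_sim`, `tfidf_vector`, `soft_tfidf` and the threshold `theta` are illustrative, the secondary similarity here is `difflib.SequenceMatcher` rather than the Jaro-Winkler measure usually paired with Soft-TFIDF, and the `log(1 + n/df)` smoothing is an assumption made to keep weights positive on tiny corpora.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def char_sim(a, b):
    """Stand-in secondary similarity in [0, 1] (assumption: not Jaro-Winkler)."""
    return SequenceMatcher(None, a, b).ratio()


def tfidf_vector(tokens, doc_freq, n_docs):
    """L2-normalised TF-IDF weights for one tokenised name."""
    tf = Counter(tokens)
    weights = {
        # Smoothed IDF (assumption) so that very common tokens keep a small weight.
        w: math.log(1 + c) * math.log(1 + n_docs / doc_freq.get(w, 1))
        for w, c in tf.items()
    }
    norm = math.sqrt(sum(v * v for v in weights.values()))
    return {w: v / norm for w, v in weights.items()} if norm else weights


def soft_tfidf(s, t, corpus, theta=0.9):
    """Soft-TFIDF between token lists s and t, given a corpus of token lists."""
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    vs = tfidf_vector(s, doc_freq, len(corpus))
    vt = tfidf_vector(t, doc_freq, len(corpus))
    score = 0.0
    for w, weight_w in vs.items():
        # Find the token of t closest to w under the secondary similarity.
        best, best_sim = None, 0.0
        for v in vt:
            sim = char_sim(w, v)
            if sim > best_sim:
                best, best_sim = v, sim
        # Only sufficiently close token pairs contribute to the score.
        if best is not None and best_sim >= theta:
            score += weight_w * vt[best] * best_sim
    return score


# Hypothetical toy corpus of tokenised entity names.
corpus = [["jacques", "chirac"], ["jaques", "chirac"], ["george", "bush"]]
score = soft_tfidf(corpus[0], corpus[1], corpus)
```

Setting `theta=1.0` reduces the sketch to exact-token TF-IDF cosine, which makes it easy to see how much the soft matching contributes for a given pair of names.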
Related resources
Evaluation of Similarity Measures for Template Matching
Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...
Semi-automatic Labeling of (Coreferent) Named Entities: An Experimental Study
In this paper, we investigate the problem of matching coreferent named entities extracted from text collections in a robust way: our long-term goal is to build similarity methods without (or with the minimum amount of) prior knowledge. In this framework, string similarity measures are the main tool at our disposal. Here we focus on the problem of evaluating such a task, especially in finding a m...
Robust, Light-weight Approaches to compute Lexical Similarity
Most text processing systems need to compare lexical units (words, entities, semantic concepts) with each other as a basic processing step within large and complex systems. A significant amount of research has taken place in formulating and evaluating multiple similarity metrics, primarily between words. Often, such techniques are resource-intensive or are applicable only to specific use cases...
Chinese Entity Relation Extraction Based on Word Co-occurrence
Chinese entity relation extraction is a part of entity relation extraction. Based on entity relation extraction technology and the features of a Chinese news corpus, this paper proposes a novel method for Chinese entity relation extraction. The method, named WCORE (word co-occurrence relation extraction), first measures the semantic similarity by word co-occurrence and then adopts pattern m...
Mining Document Collections to Facilitate Accurate Approximate Entity Matching
Many entity extraction techniques leverage large reference entity tables to identify entities in documents. Often, an entity is referenced in document collections differently from that in the reference entity tables. Therefore, we study the problem of determining whether or not a substring “approximately” matches with a reference entity. Similarity measures which exploit the correlation between...
Publication date: 2008